security issue
SecureReviewer: Enhancing Large Language Models for Secure Code Review through Secure-aware Fine-tuning
Liu, Fang, Liu, Simiao, Zhu, Yinghao, Lian, Xiaoli, Zhang, Li
Identifying and addressing security issues during the early phase of the development lifecycle is critical for mitigating the long-term negative impacts on software systems. Code review serves as an effective practice that enables developers to check their teammates' code before integration into the codebase. To streamline the generation of review comments, various automated code review approaches have been proposed, among which LLM-based methods have significantly advanced the capabilities of automated review generation. However, existing models primarily focus on general-purpose code review, and their effectiveness in identifying and addressing security-related issues remains underexplored. Moreover, adapting existing code review approaches to target security issues faces substantial challenges, including data scarcity and inadequate evaluation metrics. To address these limitations, we propose SecureReviewer, a new approach designed for enhancing LLMs' ability to identify and resolve security-related issues during code review. Specifically, we first construct a dataset tailored for training and evaluating secure code review capabilities. Leveraging this dataset, we fine-tune LLMs with our proposed secure-aware fine-tuning strategy to generate code review comments that can effectively identify security issues and provide fix suggestions. To mitigate hallucination in LLMs and enhance the reliability of their outputs, we integrate the RAG technique, which grounds the generated comments in domain-specific security knowledge. Additionally, we introduce SecureBLEU, a new evaluation metric designed to assess the effectiveness of review comments in addressing security issues. Experimental results demonstrate that SecureReviewer outperforms state-of-the-art baselines in both security issue detection accuracy and the overall quality and practical utility of generated review comments.
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.05)
- Asia > China (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Security Degradation in Iterative AI Code Generation -- A Systematic Analysis of the Paradox
Shukla, Shivani, Joshi, Himanshu, Syed, Romilla
The rapid adoption of Large Language Models (LLMs) for code generation has transformed software development, yet little attention has been given to how security vulnerabilities evolve through iterative LLM feedback. This paper analyzes security degradation in AI-generated code through a controlled experiment with 400 code samples across 40 rounds of "improvements" using four distinct prompting strategies. Our findings show a 37.6% increase in critical vulnerabilities after just five iterations, with distinct vulnerability patterns emerging across different prompting approaches. This evidence challenges the assumption that iterative LLM refinement improves code security and highlights the essential role of human expertise in the loop. We propose practical guidelines for developers to mitigate these risks, emphasizing the need for robust human validation between LLM iterations to prevent the paradoxical introduction of new security issues during supposedly beneficial code "improvements".
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.89)
The Download: introducing: the Security issue
An AI chatbot told a user how to kill himself--but the company doesn't want to "censor" it
For five months, Al Nowatzki had been talking to an AI girlfriend, "Erin," on the platform Nomi. But earlier this year, those conversations took a disturbing turn: Erin told him to kill himself, and provided explicit instructions on how to do it. Nowatzki had never had any intention of following Erin's instructions--he's a researcher who probes chatbots' limitations and dangers. But out of concern for more vulnerable individuals, he exclusively shared with MIT Technology Review screenshots of his conversations and of subsequent correspondence with a company representative, who stated that the company did not want to "censor" the bot's "language and thoughts." This is not the first time an AI chatbot has suggested that a user take violent action, including self-harm. But researchers and critics say that the bot's explicit instructions--and the company's response--are striking.
MARVEL: Multi-Agent RTL Vulnerability Extraction using Large Language Models
Collini, Luca, Ahmad, Baleegh, Ah-kiow, Joey, Karri, Ramesh
Hardware security verification is a challenging and time-consuming task. For this purpose, design engineers may utilize tools such as formal verification, linters, and functional simulation tests, coupled with analysis and a deep understanding of the hardware design being inspected. Large Language Models (LLMs) have been used to assist during this task, either directly or in conjunction with existing tools. We improve the state of the art by proposing MARVEL, a multi-agent LLM framework for a unified approach to decision-making, tool use, and reasoning. MARVEL mimics the cognitive process of a designer looking for security vulnerabilities in RTL code. It consists of a supervisor agent that devises the security policy of the system-on-chips (SoCs) using its security documentation. It delegates tasks to validate the security policy to individual executor agents. Each executor agent carries out its assigned task using a particular strategy. Each executor agent may use one or more tools to identify potential security bugs in the design and send the results back to the supervisor agent for further analysis and confirmation. MARVEL includes executor agents that leverage formal tools, linters, simulation tests, LLM-based detection schemes, and static analysis-based checks. We test our approach on a known buggy SoC based on OpenTitan from the Hack@DATE competition. We find that 20 of the 48 issues reported by MARVEL pose security vulnerabilities.
- North America > United States > New York > Kings County > New York City (0.04)
- South America > Suriname > North Atlantic Ocean (0.04)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations
Ye, Ziyang, Le, Triet Huynh Minh, Babar, M. Ali
Security misconfigurations in Container Orchestrators (COs) can pose serious threats to software systems. While Static Analysis Tools (SATs) can effectively detect these security vulnerabilities, the industry currently lacks automated solutions capable of fixing these misconfigurations. The emergence of Large Language Models (LLMs), with their proven capabilities in code understanding and generation, presents an opportunity to address this limitation. This study introduces LLMSecConfig, an innovative framework that bridges this gap by combining SATs with LLMs. Our approach leverages advanced prompting techniques and Retrieval-Augmented Generation (RAG) to automatically repair security misconfigurations while preserving operational functionality. In an evaluation on 1,000 real-world Kubernetes configurations, the approach achieved a 94% success rate while maintaining a low rate of introducing new misconfigurations. Our work makes a promising step towards automated container security management, reducing the manual effort required for configuration maintenance.
- Oceania > Australia > South Australia > Adelaide (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Research Report (0.82)
- Overview (0.68)
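The detect, retrieve, repair cycle that the LLMSecConfig abstract describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `scan`, `retrieve`, and `repair` are hypothetical callables standing in for a static analysis tool, a RAG lookup, and an LLM call, the rule IDs are made up, and the acceptance rule (the targeted finding disappears and no new finding appears) is an assumed way to approximate "a low rate of introducing new misconfigurations".

```python
from typing import Callable, Set


def repair_config(
    config: str,
    scan: Callable[[str], Set[str]],           # hypothetical: SAT returning rule IDs
    retrieve: Callable[[str], str],            # hypothetical: RAG lookup for a rule ID
    repair: Callable[[str, str, str], str],    # hypothetical: LLM repair call
    max_attempts: int = 3,
) -> str:
    """Repair one finding per attempt; keep a candidate only if it fixes the
    targeted finding without introducing any new one (assumed acceptance rule)."""
    for _ in range(max_attempts):
        findings = scan(config)
        if not findings:
            return config
        target = sorted(findings)[0]
        candidate = repair(config, target, retrieve(target))
        after = scan(candidate)
        if target not in after and after <= findings:
            config = candidate
    return config


# Toy stand-ins so the sketch runs without a real scanner or model.
def toy_scan(config: str) -> Set[str]:
    found = set()
    if "privileged: true" in config:
        found.add("privileged-container")
    if "runAsNonRoot: false" in config:
        found.add("run-as-root")
    return found


def toy_retrieve(rule_id: str) -> str:
    docs = {
        "privileged-container": "Set privileged to false.",
        "run-as-root": "Set runAsNonRoot to true.",
    }
    return docs[rule_id]


def toy_repair(config: str, rule_id: str, context: str) -> str:
    if rule_id == "privileged-container":
        return config.replace("privileged: true", "privileged: false")
    return config.replace("runAsNonRoot: false", "runAsNonRoot: true")


broken = "privileged: true\nrunAsNonRoot: false"
repaired = repair_config(broken, toy_scan, toy_retrieve, toy_repair)
```

Re-scanning each candidate before accepting it is the design choice worth noting here: a repair that silences one rule but trips another is discarded rather than applied.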
Artificial-Intelligence Generated Code Considered Harmful: A Road Map for Secure and High-Quality Code Generation
Chong, Chun Jie, Yao, Zhihao, Neamtiu, Iulian
Generating code via an LLM, rather than writing it from scratch, has exploded in popularity. However, the security implications of LLM-generated code are still unknown. We performed a study that compared the security and quality of human-written code with that of LLM-generated code, for a wide range of programming tasks, including data structures, algorithms, cryptographic routines, and LeetCode questions. To assess code security we used unit testing, fuzzing, and static analysis. For code quality, we focused on complexity and size. We found that the LLM can generate incorrect code that fails to implement the required functionality, especially for more complicated tasks; such errors can be subtle. For example, for the cryptographic algorithm SHA1, the LLM generated an incorrect implementation that nevertheless compiles. In cases where its functionality was correct, we found that LLM-generated code is less secure, primarily due to the lack of defensive programming constructs, which invites a host of security issues such as buffer overflows or integer overflows. Fuzzing has revealed that LLM-generated code is more prone to hangs and crashes than human-written code. Quality-wise, we found that the LLM generates bare-bones code that lacks defensive programming constructs and is typically more complex (per line of code) than human-written code. Next, we constructed a feedback loop that asked the LLM to re-generate the code and eliminate the found issues (e.g., malloc overflow, array index out of bounds, null dereferences). We found that the LLM fails to eliminate such issues consistently: while it succeeds in some cases, we found instances where the re-generated, supposedly more secure code contains new issues; we also found that, upon prompting, the LLM can introduce issues into files that were issue-free before prompting.
- Europe (0.14)
- North America > United States > New Jersey (0.04)
The potential of LLM-generated reports in DevSecOps
Lykousas, Nikolaos, Argyropoulos, Vasileios, Casino, Fran
Alert fatigue is a common issue faced by software teams using the DevSecOps paradigm. The overwhelming number of warnings and alerts generated by security and code scanning tools, particularly in smaller teams where resources are limited, leads to desensitization and diminished responsiveness to security warnings, potentially exposing systems to vulnerabilities. This paper explores the potential of LLMs in generating actionable security reports that emphasize the financial impact and consequences of detected security issues, such as credential leaks, if they remain unaddressed. A survey conducted among developers indicates that LLM-generated reports significantly enhance the likelihood of immediate action on security issues by providing clear, comprehensive, and motivating insights. Integrating these reports into DevSecOps workflows can mitigate attention saturation and alert fatigue, ensuring that critical security warnings are addressed effectively.
- North America > United States (0.06)
- Europe > Spain > Catalonia (0.04)
- Europe > Spain > Balearic Islands > Mallorca > Palma (0.04)
- Europe > Romania (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Commercial Services & Supplies > Security & Alarm Services (0.95)
The Morning After: OpenAI's week of security issues
Perhaps unsurprisingly, July 4th was a quiet day for news, but we've still got editorials on e-ink writing, the most-delayed video game ever and more bad news from the makers of ChatGPT. Earlier this week, engineer and Swift developer Pedro José Pereira Vieito dug into OpenAI's Mac ChatGPT app and found that it was storing user conversations locally in plain text, rather than encrypting them. Because that app is only available from OpenAI's website, and since it's not available on the App Store, it doesn't have to follow Apple's sandboxing requirements. OpenAI released an update that added encryption to locally stored chats. Then, more bad news stemmed from issues in 2023. Last spring, a hacker obtained information about OpenAI after illicitly accessing the company's internal messaging systems.
- Law (1.00)
- Information Technology > Security & Privacy (0.54)
- Leisure & Entertainment > Games > Computer Games (0.40)
- Government > Regional Government > North America Government > United States Government (0.34)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Enhancing Security of AI-Based Code Synthesis with GitHub Copilot via Cheap and Efficient Prompt-Engineering
Res, Jakub, Homoliak, Ivan, Perešíni, Martin, Smrčka, Aleš, Malinka, Kamil, Hanacek, Petr
AI assistants for coding are on the rise. However, one of the reasons developers and companies avoid harnessing their full potential is the questionable security of the generated code. This paper first reviews the current state-of-the-art and identifies areas for improvement on this issue. Then, we propose a systematic approach based on prompt-altering methods to achieve better code security of (even proprietary black-box) AI-based code generators such as GitHub Copilot, while minimizing the complexity of the application from the user's point of view, the computational resources, and operational costs. In sum, we propose and evaluate three prompt-altering methods: (1) scenario-specific, (2) iterative, and (3) general clause, and we discuss their combination. Contrary to the audit of code security, the latter two of the proposed methods require no expert knowledge from the user. We assess the effectiveness of the proposed methods on GitHub Copilot using the OpenVPN project in realistic scenarios, and we demonstrate that the proposed methods reduce the number of insecure generated code samples by up to 16% and increase the number of secure code samples by up to 8%. Since our approach does not require access to the internals of the AI models, it can in general be applied to any AI-based code synthesizer, not only GitHub Copilot.
- Europe > Czechia > South Moravian Region > Brno (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- North America > Canada > Ontario > Toronto (0.04)
Can LLMs Patch Security Issues?
Alrashedy, Kamel, Aljasser, Abdullah
Large Language Models (LLMs) have shown impressive proficiency in code generation. Nonetheless, similar to human developers, these models might generate code that contains security vulnerabilities and flaws. Writing secure code remains a substantial challenge, as vulnerabilities often arise during interactions between programs and external systems or services, such as databases and operating systems. In this paper, we propose a novel approach, Feedback-Driven Solution Synthesis (FDSS), in which LLMs receive feedback from Bandit, a static code analysis tool, and then generate potential solutions to resolve security vulnerabilities. Each solution, along with the vulnerable code, is then sent back to the LLM for code refinement. Our approach shows a significant improvement over the baseline and outperforms existing approaches. Furthermore, we introduce a new dataset, PythonSecurityEval, collected from real-world scenarios on Stack Overflow to evaluate the LLMs' ability to generate secure code. Code and data are available at https://github.com/Kamel773/LLM-code-refine
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Africa > Rwanda > Kigali > Kigali (0.04)
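The feedback-driven loop in the FDSS abstract (analyzer feedback, LLM-proposed solutions, one refinement per solution) might look roughly like the sketch below. It is an illustration only, not the code from the linked repository: `analyze`, `propose_fixes`, and `refine` are hypothetical callables standing in for a Bandit run and two LLM calls, and picking the candidate with the fewest remaining findings is an assumed selection rule that the abstract does not specify.

```python
from typing import Callable, List


def fdss_refine(
    code: str,
    analyze: Callable[[str], List[str]],                   # hypothetical: analyzer findings
    propose_fixes: Callable[[str, List[str]], List[str]],  # hypothetical: LLM solutions
    refine: Callable[[str, str], str],                     # hypothetical: LLM refinement
) -> str:
    """One pass: collect findings, ask for candidate solutions, refine the
    vulnerable code once per solution, keep the candidate with fewest findings."""
    findings = analyze(code)
    if not findings:
        return code
    best_code, best_count = code, len(findings)
    for solution in propose_fixes(code, findings):
        candidate = refine(code, solution)
        remaining = len(analyze(candidate))
        if remaining < best_count:  # assumed selection rule
            best_code, best_count = candidate, remaining
    return best_code


# Toy stand-ins so the sketch runs without Bandit or a model.
def toy_analyze(code: str) -> List[str]:
    # Illustrative finding text; a real analyzer reports richer detail.
    return ["subprocess call with shell=True"] if "shell=True" in code else []


def toy_propose(code: str, findings: List[str]) -> List[str]:
    return ["pass the command as an argument list and drop shell=True"]


def toy_refine(code: str, solution: str) -> str:
    return code.replace('"ls " + path, shell=True', '["ls", path]')


vulnerable = 'subprocess.run("ls " + path, shell=True)'
fixed = fdss_refine(vulnerable, toy_analyze, toy_propose, toy_refine)
```

Because every candidate is re-analyzed before it can replace the original, a refinement that leaves the finding count unchanged or worse falls back to the input code.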